Searching in a DVI File
Abstract
Most, if not all, DVI previewers and printer drivers provide a facility for selecting a subset of the pages of a document; this subset is specified using the contents of the \count0 to \count9 registers that TeX outputs to identify each page of the file. This makes it easy to preview just pages 7, 8 and 9, but what if you know you want to look at the page with the paragraph about Katzenellenbogen by the Sea? If you're not sure how the page makeup worked out, you won't know where that is. Trial and error will find the right page sooner or later, but it would be more convenient if there were a facility for selecting a page by its content, that is, by the occurrence on it of a particular string.
Many efficient string searching algorithms already exist; they are used routinely in text editors and other programs. These algorithms take as their input a string of characters (the target) and a pattern. The pattern specifies a set of strings. The task of the searching algorithm is to find the location, if any, within the target of a substring that belongs to the set specified by the pattern. The pattern may simply be a single string, specifying itself, or it may use metacharacters and some formalism such as regular expressions to specify a larger set of strings. In general, the more elaborate the language permitted for specifying patterns, the more elaborate the search algorithm will be. There are well-known efficient algorithms for searching for single strings [4] and for sets specified by regular expressions [1].
A DVI file is a sequence of typesetting commands, some of which may have parameters. (DVI commands are fully described in [3, §§583-590].) A user specifying a pattern to search for will want to type that pattern at a terminal using the subset of ASCII that corresponds to printable characters.
Thus, before one of the standard string searching algorithms can be employed, either the pattern must be converted to a sequence of DVI commands, the DVI file must be mapped into a string of printable ASCII characters, or both DVI file and pattern must be mapped into some other common representation.
Leaving aside the possibility of using anything more elaborate than simple strings as patterns, a possible approach based on the first of these options is to use TeX to convert the pattern into DVI. Using this approach, it would be possible for patterns to be specified in the TeX language, and thus to make use of macros and to carry out searches on all features of a document, including math mode material and even rules and spaces. Search patterns could be extracted directly from the TeX source of a document. (In fact, for non-trivial search strings they would probably have to be, because of the difficulty of deciding exactly what TeX commands produced some particular output.) However, the problems of interfacing to TeX are considerable, and the overheads of running it to process every search string are unlikely to be acceptable. Furthermore, the actual DVI produced by TeX for a particular string will depend on the context in which that string appears. Such elements as interword spacing, line breaking and hyphenation may be very different when the string appears in the middle of a paragraph and when it is typeset in isolation. Thus, even after a pattern had been converted to DVI, it would not be possible to apply simple string matching: some sort of fuzzy matching would be necessary.
If converting the pattern to DVI is problematic, what about converting the DVI file to ASCII? This is essentially the same task as that performed by DVI previewers for dumb terminals, and it suffers from the same limitations: only text material can be dealt with properly, and spacing must be approximated.
The conversion has the great virtue of being simple, and, once it has been done, any string matching algorithm can be used, including those that support regular expressions. This approach will be looked at further in the next section.
Finally, there is the possibility of converting both the DVI file and the pattern to some common representation. The obvious choice here is the extended character code set used by TeX to specify characters in its math symbol and extension fonts. This requires some means of specifying characters in the pattern other than printable ASCII characters: the obvious way of doing this is by permitting a suitable subset of TeX commands to be used in patterns. Matching can then be done on text and math mode material. This approach is further described in section 3.
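As a rough illustration of the second option (mapping the DVI file to printable ASCII and then applying an ordinary matcher), the Python sketch below walks the DVI command stream page by page and reports the \count0 value of every page whose approximate text matches a regular expression. It is not the implementation described in this paper. The names `page_texts` and `find_pages` and the `word_gap` threshold are hypothetical, and the sketch assumes text fonts whose character codes coincide with printable ASCII and DVI units equal to TeX's scaled points. Any other character is shown as `?`, and every vertical move or large horizontal move becomes a single space.

```python
import re


def page_texts(dvi: bytes, word_gap: int = 100_000):
    """Yield (count0, approximate_text) for each page of a DVI byte string.

    word_gap is a hypothetical threshold (in DVI units) above which a
    horizontal move is treated as an interword space rather than a kern.
    """
    pos = 0

    def num(n, signed=True):
        # Read an n-byte big-endian parameter and advance the position.
        nonlocal pos
        value = int.from_bytes(dvi[pos:pos + n], "big", signed=signed)
        pos += n
        return value

    out, stack, count0 = [], [], 0
    regs = [0, 0, 0]                        # [scratch, w, x] spacing registers
    while pos < len(dvi):
        op = dvi[pos]
        pos += 1
        if op < 128:                        # set_char_0 .. set_char_127
            out.append(chr(op) if 32 <= op < 127 else "?")
        elif 128 <= op <= 131 or 133 <= op <= 136:   # set1-4, put1-4
            code = num((op - 127) % 5, signed=False)
            out.append(chr(code) if 32 <= code < 127 else "?")
        elif op in (132, 137):              # set_rule, put_rule: two 4-byte sizes
            pos += 8
        elif op == 139:                     # bop: \count0..\count9, then a pointer
            count0 = num(4)
            pos += 40                       # nine more counts + previous-bop pointer
            out, stack, regs = [], [], [0, 0, 0]
        elif op == 140:                     # eop: the page is complete
            yield count0, " ".join("".join(out).split())
        elif op == 141:                     # push
            stack.append(list(regs))
        elif op == 142:                     # pop
            regs = stack.pop()
        elif 143 <= op <= 156:              # right1-4, w0-4, x0-4
            group, n = divmod(op - 142, 5)  # group 0 = right, 1 = w, 2 = x
            move = num(n) if n else regs[group]
            regs[group] = move              # slot 0 is only a scratch value
            if move > word_gap:
                out.append(" ")
        elif 157 <= op <= 170:              # down1-4, y0-4, z0-4
            pos += (op - 156) % 5
            out.append(" ")                 # a vertical move ends the current word
        elif op == 138 or 171 <= op <= 234: # nop, fnt_num_0..63
            pass
        elif 235 <= op <= 238:              # fnt1-4
            pos += op - 234
        elif 239 <= op <= 242:              # xxx1-4 (\special): skip its bytes
            length = num(op - 238, signed=False)
            pos += length
        elif 243 <= op <= 246:              # fnt_def1-4
            pos += (op - 242) + 12          # font number, checksum, scale, design size
            pos += 2 + dvi[pos] + dvi[pos + 1]   # name lengths, then the name
        elif op == 247:                     # pre: id byte, num, den, mag, comment
            pos += 13
            pos += 1 + dvi[pos]
        elif op == 248:                     # post: no more pages follow
            return
        else:
            raise ValueError(f"unexpected DVI opcode {op}")


def find_pages(dvi: bytes, pattern: str):
    """Return the \\count0 value of every page whose text matches pattern."""
    return [count0 for count0, text in page_texts(dvi) if re.search(pattern, text)]
```

A call such as `find_pages(open("doc.dvi", "rb").read(), r"by\s+the\s+Sea")` (with a hypothetical file name) would then list candidate pages. Because the spacing is only approximated, words hyphenated across lines, ligatures, and math material will generally not match, which is the limitation noted above.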
Publication date: 2011